Assignment 4 - 158.755 - S01 2021

James Bristow - 11189075
Martin Brenner - 20022611

Dolphin Photo-Identification

Identification of individual dolphins based on their dorsal fin

Abstract

In this report, we document how we identify individual dolphins based on their dorsal fin.

For this task, a system was designed that takes a file or directory of dorsal fin photographs as input and predicts which individual appears in each photo. For the fin detection process, a Darknet YOLOv4 network was trained on the public NDD20 dolphin fin dataset. In a second step, each fin is automatically cropped from the image and put through pre-processing to enhance its features. For the identification of the individual dolphin, a Triplet Loss neural network was configured. The network was trained on a cleaned and labelled dataset of individual dolphin fins provided by NIWA marine researchers. According to these marine scientists, it may be possible to identify individual dolphins by their fin pigmentation patterns.
UMAP is then used to visualise the network embeddings, and clusters of individuals are highlighted using HDBSCAN.
A software solution combining all steps was produced using a Flask web server and a Vue.js front-end, which enables the user to upload a batch of photos that are then put through the identification process. The final software solution allows users to visualise the neural network embeddings, identify clusters and outliers, sort images by the estimated class label (the individual dolphin), and import and export datasets in CSV and JSON format.

Introduction

The identification of individual dolphins is of relevance to researchers who want to distinguish and catalogue them for study purposes. The task of matching individuals by visual comparison is time-consuming, tedious, and prone to error. We propose a system that can aid in this identification process via a machine learning pipeline.

In this report, we first perform an exploratory data analysis of the NDD20 and pigmentation datasets. We consider individual images, and identify their most significant features. We perform extensive pre-processing on these images, and note the limitations of the provided datasets.

We trained a YOLOv4 convolutional neural network on the NDD20 dataset using the Darknet framework. Images were annotated with bounding boxes, and the bounding box data was stored in the YOLO format. We then used OpenCV to load the trained Darknet model, make predictions, and automatically crop the detected fins from the images.
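The detect-then-crop step can be sketched as follows. This is a minimal illustration, not our exact pipeline code: the config/weights paths are placeholders, and it assumes OpenCV 4.4+ built with the dnn module. The cropping helper is plain NumPy and can be checked on its own.

```python
import numpy as np

def crop_box(image, box, pad=0):
    """Crop a detection box (x, y, w, h) from an image array, clamped to the image bounds."""
    x, y, w, h = box
    h_img, w_img = image.shape[:2]
    x0, y0 = max(x - pad, 0), max(y - pad, 0)
    x1, y1 = min(x + w + pad, w_img), min(y + h + pad, h_img)
    return image[y0:y1, x0:x1]

def detect_fins(image, cfg_path, weights_path, conf=0.1):
    """Run a trained Darknet model via OpenCV's dnn module and return fin crops.
    Paths are placeholders for the trained yolov4-dolphin files."""
    import cv2  # requires OpenCV >= 4.4 with dnn support
    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    model = cv2.dnn_DetectionModel(net)
    # Match the 608x608 network input resolution from the YOLOv4 config.
    model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)
    _, _, boxes = model.detect(image, confThreshold=conf, nmsThreshold=0.4)
    return [crop_box(image, b) for b in boxes]

# The cropping logic alone can be verified on a dummy image:
img = np.arange(100).reshape(10, 10)
print(crop_box(img, (2, 3, 4, 5)).shape)  # (5, 4)
```

Clamping the box to the image bounds matters in practice, since detections near the frame edge can otherwise produce empty or negative slices.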

The FaceNet implementation of triplet loss was subsequently used to train a neural network for individual dolphin recognition. This network was trained on a catalogue of pre-labelled dolphin images. FaceNet served as the backbone network, which we integrated with the Keras framework for use in our project.
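The core idea of the triplet loss can be shown with a small NumPy sketch (the real training uses the FaceNet/Keras implementation; this is only the loss formula): for each triplet, the anchor is pulled towards a positive example (same dolphin) and pushed away from a negative example (different dolphin) by at least a margin.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss over batches of embeddings:
    L = mean(max(||a - p||^2 - ||a - n||^2 + margin, 0))."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(pos_dist - neg_dist + margin, 0.0).mean()

a = np.array([[1.0, 0.0]])  # anchor embedding
p = np.array([[0.9, 0.1]])  # same individual, nearby
n = np.array([[0.0, 1.0]])  # different individual, far away
print(triplet_loss(a, p, n))  # 0.0 -- the negative is already beyond the margin
```

Once trained this way, embeddings of the same individual cluster together, which is what makes the later UMAP/HDBSCAN visualisation meaningful.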

Next, we deployed these algorithms on a web application via a Docker stack. We used a Flask webserver for the backend application logic, and a Vue.js frontend for the graphical user interface. Nginx was used as a proxy webserver to improve the theoretical scalability of the application, by facilitating load-balancing and caching. Redis was used as a caching system, message broker, and a job queue. Celery workers were used to asynchronously perform the more computationally expensive machine learning tasks without disrupting the performance of the application.

The project's GitHub repository can be found here: https://github.com/JBris/dolphin_segmentation

1. Exploratory Data Analysis

Two public datasets were used for the project:
1) NDD20 (https://arxiv.org/abs/2005.13359) for the detection task.
2) The NIWA pigmentation dataset: fin images provided by NIWA marine researchers for the identification process (https://niwa.co.nz/news/know-your-dolphin-by-the-fin-says-niwa-scientist).

Both datasets came pre-labelled; only a small conversion from the COCO to the YOLO annotation format was required to use NDD20 with the YOLO network.
The pigmentation dataset was provided as a folder structure, with background-cleaned images of fins grouped into one directory per individual.
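The COCO-to-YOLO conversion amounts to rescaling each box. As a sketch (assuming the standard COCO bounding box convention of [x_min, y_min, width, height] in pixels, and a single "dolphin" class with id 0):

```python
def coco_to_yolo(bbox, img_w, img_h, class_id=0):
    """Convert a COCO bounding box [x_min, y_min, width, height] (pixels)
    to a YOLO annotation line: 'class x_center y_center width height',
    with all coordinates normalised to [0, 1] by the image size."""
    x_min, y_min, w, h = bbox
    x_c = (x_min + w / 2) / img_w
    y_c = (y_min + h / 2) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A box covering the centre half of a 400x200 image:
print(coco_to_yolo([100, 50, 200, 100], 400, 200))
# 0 0.500000 0.500000 0.500000 0.500000
```

Each image then gets a .txt file of such lines alongside it, which is the layout Darknet expects.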

Fig.1.1 shows a sample of the images used for fin detection. The bounding box coordinates for the fin(s) were included.
Fig.1.2 shows one of the images provided for identification. When the image was converted to three channels (RGB), it became apparent that some of the images still contained background pixels that were merely hidden behind an alpha transparency mask in the PNG file. This would cause issues, as the network trained for identification only uses three channels of the image. A pre-processing step to remove these pixels is therefore required. Fig.1.3 shows the image including the alpha channel.
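The fix can be sketched with NumPy (a simplified version of the pre-processing step; the background colour and the hard alpha == 0 threshold are illustrative choices): pixels hidden by the alpha mask are overwritten with a uniform background before the alpha channel is dropped.

```python
import numpy as np

def flatten_alpha(rgba, background=0):
    """Replace pixels hidden by the alpha mask with a uniform background
    value, then drop the alpha channel, so the resulting RGB image matches
    what the three-channel identification network will actually see."""
    rgb = rgba[..., :3].copy()
    alpha = rgba[..., 3]
    rgb[alpha == 0] = background  # hidden background pixels would otherwise leak through
    return rgb

# A 1x2 RGBA image: one visible pixel, one fully transparent pixel.
img = np.array([[[200, 10, 10, 255], [99, 99, 99, 0]]], dtype=np.uint8)
print(flatten_alpha(img).tolist())  # [[[200, 10, 10], [0, 0, 0]]]
```

Without this step, two visually identical PNGs could produce different RGB inputs depending on what happened to be stored under the transparent regions.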

The NDD20 dataset contains 2201 images - all JPG files.

Next, we will explore the pigmentation dataset.

There are 3746 images within the pigmentation dataset.

There are 186 different dolphin pigmentation classes.

We can see that the pigmentation dataset is composed of PNG files.

We can see above the top 5 classes.

It appears that the classes are not well balanced, which could be problematic for classification accuracy.

2. Fin detection using YOLO

YOLO was chosen as it was one of the most advanced single-shot detectors available at the time.
The data was prepared in a directory including the required bounding box definitions.

2.1 Training the model

Check for the files:

Next, the files will be shuffled and split into training and testing sets. An 85-15 split is used. According to the Darknet documentation, we want at least 2000 training objects per class.
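The shuffle-and-split step can be sketched as below (a simplified stand-in for our notebook code; the file paths and fixed seed are illustrative):

```python
import random

def split_dataset(image_paths, train_frac=0.85, seed=42):
    """Shuffle the image paths and split them 85/15 into train/test lists,
    ready to be written to train.txt and test.txt for Darknet."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # seeded so the split is reproducible
    cut = int(len(paths) * train_frac)
    return paths[:cut], paths[cut:]

# Hypothetical paths, sized to match the 2201-image NDD20 dataset:
files = [f"data/obj/img_{i:04d}.jpg" for i in range(2201)]
train, test = split_dataset(files)
print(len(train), len(test))  # 1870 331

# Darknet expects one image path per line:
# with open("train.txt", "w") as f:
#     f.write("\n".join(train))
```

Shuffling before splitting matters because the source images are grouped on disk; a sequential split could put entire photo sessions into only one of the two sets.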

The lists of train and test file names will be written to train.txt and test.txt respectively.

Darknet needs to be compiled with GPU support enabled. The commands below were executed in a Google Colab environment and cannot be run locally unless a suitable environment is prepared, which is why they are not presented as executable code.

See 4.1_fin_detection_training; 4.2_fin_detection and segmentation_yolov4; and 4.3_fin_detection_metrics for the Colab notebooks.

Some metrics for the final object detector are shown below.

%cd /proj/MyDrive/DolphinClassification/neural_nets/darknet/
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
!sed -i 's/LIBSO=0/LIBSO=1/' Makefile

!make

%env IMAGES_DIR=/content/NDD20
%env MODEL_DIR=/proj/MyDrive/DolphinClassification/models/darknet

Next, we'll begin the training process.

! chmod +x ./darknet
! chmod +rwx /proj/MyDrive/DolphinClassification/models/darknet/training
! ./darknet detector train ${IMAGES_DIR}/obj.data ${MODEL_DIR}/yolov4-dolphin.cfg ${MODEL_DIR}/yolov4.conv.137 -dont_show -map
! ./darknet detector train ${IMAGES_DIR}/obj.data ${MODEL_DIR}/yolov4-dolphin.cfg ${MODEL_DIR}/training/yolov4-dolphin_last.weights -dont_show -map


CUDA-version: 11000 (11020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 3.2.0
Prepare additional network for mAP calculation...
 0 : compute_capability = 700, cudnn_half = 1, GPU: Tesla V100-SXM2-16GB
net.optimized_memory = 0
mini_batch = 1, batch = 32, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0
     Create cudnn-handle 0
     conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1                                      ->  304 x 304 x  64
   4 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   5 conv     32       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  32 0.379 BF
   6 conv     64       3 x 3/ 1    304 x 304 x  32 ->  304 x 304 x  64 3.407 BF

...

detections_count = 3177, unique_truth_count = 2763
class_id = 0, name = dolphin, ap = 98.83% (TP = 2735, FP = 67)

for conf_thresh = 0.10, precision = 0.98, recall = 0.99, F1-score = 0.98
for conf_thresh = 0.10, TP = 2735, FP = 67, FN = 28, average IoU = 77.02 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.988323, or 98.83 %
Total Detection Time: 243 Seconds